{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TSNE Demo\n",
    "\n",
    "[TSNE](https://lvdmaaten.github.io/tsne/) (T-Distributed Stochastic Neighborhood Embedding) is a fantastic dimensionality reduction algorithm used to visualize large complex datasets including medical scans, neural network weights, gene expressions and much more.\n",
    "\n",
    "cuML's TSNE algorithm supports both the faster Barnes Hut $ n logn $ algorithm and also the slower Exact $ n^2 $ .\n",
    "\n",
    "The model can take array-like objects, either in host as NumPy arrays as well as cuDF DataFrames as the input.\n",
    "\n",
    "For information about cuDF, refer to the [cuDF documentation](https://docs.rapids.ai/api/cudf/stable).\n",
    "\n",
    "For information on cuML's TSNE implementation: https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.TSNE."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gzip\n",
    "import matplotlib.pyplot as plt\n",
    "import numpy as np\n",
    "import os\n",
    "from cuml.manifold import TSNE\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fashion MNIST\n",
    "\n",
    "We are going to work with the fashion mnist data set.\n",
    "\n",
    "This is a dataset consisting of 70,000 28x28 grayscale images of clothing. It should already be in the data/fashion folder, but let's first check!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "if not os.path.exists('data/fashion'):\n",
    "    print(\"error, data is missing!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Helper Functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# https://github.com/zalandoresearch/fashion-mnist/blob/master/utils/mnist_reader.py\n",
    "def load_mnist_train(path):\n",
    "    \"\"\"Load MNIST data from path\"\"\"\n",
    "    labels_path = os.path.join(path, 'train-labels-idx1-ubyte.gz')\n",
    "    images_path = os.path.join(path, 'train-images-idx3-ubyte.gz')\n",
    "\n",
    "    with gzip.open(labels_path, 'rb') as lbpath:\n",
    "        labels = np.frombuffer(lbpath.read(), dtype=np.uint8,\n",
    "                               offset=8)\n",
    "\n",
    "    with gzip.open(images_path, 'rb') as imgpath:\n",
    "        images = np.frombuffer(imgpath.read(), dtype=np.uint8,\n",
    "                               offset=16).reshape(len(labels), 784)\n",
    "    return images, labels"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Load The Data\n",
    "Let's load up the fashion MNIST data!\n",
    "\n",
    "We can also visualize one fashion image (a handbag) which is of size 28 by 28"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "images, labels = load_mnist_train(\"data/fashion\")\n",
    "\n",
    "plt.figure(figsize=(5,5))\n",
    "plt.imshow(images[100].reshape((28, 28)), cmap = 'gray')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Reduce Dimensionality with TSNE\n",
    "\n",
    "Now, let's reduce the data from 28*28 dimensions to 2."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "tsne = TSNE(n_components = 2, method = 'barnes_hut', random_state=23)\n",
    "%time embedding = tsne.fit_transform(images)\n",
    "\n",
    "print(embedding[:10], embedding.shape)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Visualize Embedding\n",
    "\n",
    "Let's visualize TSNE's embedding!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "classes = [\n",
    "    'T-shirt/top',\n",
    "    'Trouser',\n",
    "    'Pullover',\n",
    "    'Dress',\n",
    "    'Coat',\n",
    "    'Sandal',\n",
    "    'Shirt',\n",
    "    'Sneaker',\n",
    "    'Bag',\n",
    "    'Ankle boot'\n",
    "]\n",
    "\n",
    "fig, ax = plt.subplots(1, figsize = (14, 10))\n",
    "plt.scatter(embedding[:,1], embedding[:,0], s = 0.3, c = labels, cmap = 'Spectral')\n",
    "plt.setp(ax, xticks = [], yticks = [])\n",
    "cbar = plt.colorbar(boundaries = np.arange(11)-0.5)\n",
    "cbar.set_ticks(np.arange(10))\n",
    "cbar.set_ticklabels(classes)\n",
    "plt.title('Fashion MNIST Embedded via TSNE');"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
